Abstract — Non-orthogonal space-time block codes (STBC) from 
cyclic division algebras (CDA) having large dimensions are at- 
tractive because they can simultaneously achieve both high spec- 
tral efficiencies (same spectral efficiency as in V-BLAST for a 
given number of transmit antennas) as well as full transmit di- 
versity. Decoding of non-orthogonal STBCs with hundreds of di- 
mensions has been a challenge. In this paper, we present a prob- 
abilistic data association (PDA) based algorithm for decoding 
non-orthogonal STBCs with large dimensions. Our simulation 
results show that the proposed PDA-based algorithm achieves 
near SISO AWGN uncoded BER as well as near-capacity coded 
BER (within about 5 dB of the theoretical capacity) for large 
non-orthogonal STBCs from CDA. We study the effect of spa- 
tial correlation on the BER, and show that the performance loss 
due to spatial correlation can be alleviated by providing more 
receive spatial dimensions. We report good BER performance 
when a training-based iterative decoding/channel estimation is 
used (instead of assuming perfect channel knowledge) in chan- 
nels with large coherence times. A comparison of the perfor- 
mances of the PDA algorithm and the likelihood ascent search 
(LAS) algorithm (reported in our recent work) is also presented. 

Keywords — Non-orthogonal STBCs, large dimensions, high spec- 
tral efficiency, low-complexity near-ML decoding, probabilistic data 
association. 

I. Introduction 

Multiple-input multiple-output (MIMO) systems that employ 
non-orthogonal space-time block codes (STBC) from cyclic 
division algebras (CDA) for arbitrary number of transmit an- 
tennas, Nt, are quite attractive because they can simultane- 
ously provide both full-rate (i.e., Nt complex symbols per 
channel use, which is same as in V-BLAST) as well as full 
transmit diversity [1]. The 2 x 2 Golden code is a well-known
non-orthogonal STBC from CDA for 2 transmit antennas [2].
High spectral efficiencies of the order of tens of bps/Hz can 
be achieved using large non-orthogonal STBCs. For exam- 
ple, a 16 x 16 STBC from CDA has 256 complex symbols 
in it with 512 real dimensions; with 16-QAM and rate-3/4 
turbo code, this system offers a high spectral efficiency of 48 
bps/Hz. Decoding of non-orthogonal STBCs with such large 
dimensions, however, has been a challenge. Sphere decoder 
and its low-complexity variants are prohibitively complex for 
decoding such STBCs with hundreds of dimensions. 
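
The spectral efficiency figures quoted above follow from simple arithmetic: a full-rate STBC carries Nt complex symbols per channel use, each symbol carries log2(M) bits for M-QAM, and the outer code scales this by its rate. The following is a minimal sketch of that calculation; the function name `spectral_efficiency` is a hypothetical helper introduced here for illustration only.

```python
import math

def spectral_efficiency(n_t, qam_size, code_rate):
    """bps/Hz of a full-rate STBC: n_t symbols per channel use,
    log2(qam_size) bits per symbol, scaled by the outer code rate."""
    return n_t * math.log2(qam_size) * code_rate

# 16 x 16 STBC, 16-QAM, rate-3/4 turbo code
print(spectral_efficiency(16, 16, 0.75))  # 48.0
# 12 x 12 STBC, 4-QAM, rate-3/4 turbo code
print(spectral_efficiency(12, 4, 0.75))   # 18.0
```

The second call reproduces the 18 bps/Hz operating point used for the coded results later in the paper.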

In this paper, we present a probabilistic data association (PDA) 
based algorithm for decoding large non-orthogonal STBCs 
from CDA. Key attractive features of this algorithm are its 
low-complexity and near-ML performance in systems with 
large dimensions (e.g., hundreds of dimensions). While cre- 
ating hundreds of dimensions in space alone (e.g., V-BLAST) 
requires hundreds of antennas, use of non-orthogonal STBCs 
from CDA can create hundreds of dimensions with just tens 
of antennas (space) and tens of channel uses (time). Given
that 802.11 smart WiFi products with 12 transmit antennas¹
at 2.5 GHz are now commercially available [4] (which establishes
that issues related to placement of many antennas and
RF/IF chains can be solved in large aperture communication
terminals like set-top boxes/laptops), large non-orthogonal
STBCs (e.g., 16 x 16 STBC from CDA) in combination with
large dimension near-ML decoding using PDA can enable
communications at increased spectral efficiencies of the order
of tens of bps/Hz (note that current standards achieve only
< 10 bps/Hz using only up to 4 transmit antennas).

PDA, originally developed for target tracking, is widely used
in digital communications [5]-[12]. In particular, the PDA algorithm
is a reduced-complexity alternative to the a posteriori
probability (APP) decoder/detector/equalizer. Near-optimal
performance has been demonstrated for PDA-based multiuser
detection in CDMA systems [5]-[8]. PDA has been used in
the detection of V-BLAST signals with a small number of dimensions
[10]-[12]. To our knowledge, PDA has not been
reported for decoding non-orthogonal STBCs with hundreds 
of dimensions so far. Our results in this paper can be summa- 
rized as follows: 

• We adapt the PDA algorithm for decoding non-orthogo- 
nal STBCs with large dimensions. With i.i.d fading and 
perfect CSIR, the algorithm achieves near-SISO AWGN 
uncoded BER and near-capacity coded BER (within about 
5 dB of the theoretical capacity) for 12 x 12 STBC from 
CDA, 4-QAM, rate-3/4 turbo code, and 18 bps/Hz. 

• Relaxing the perfect CSIR assumption, we report results 
with a training based iterative PDA decoding/channel es- 
timation scheme. The iterative scheme is shown to be 
effective with large coherence times. 

• Relaxing the i.i.d fading assumption by adopting a spatially
correlated MIMO channel model (proposed by Gesbert
et al. in [18]), we show that the performance loss due
to spatial correlation is alleviated by using more receive
spatial dimensions for a fixed receiver aperture.

• Finally, the performance of the PDA algorithm is compared
with that of the likelihood ascent search (LAS)
algorithm we recently presented in [13]-[15]. The PDA
algorithm is shown to perform better than the LAS algorithm
at low SNRs for higher-order QAM (e.g., 16-QAM),
and in the presence of spatial correlation.

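¹ The 12 antennas in these products are currently used only for beamforming. Single-beam multi-antenna approaches can offer range increase and interference avoidance, but not spectral efficiency increase.

II. System Model

Consider an STBC MIMO system with multiple transmit and receive antennas. An $(n, p, k)$ STBC is represented by a matrix $\mathbf{X}_c \in \mathbb{C}^{n \times p}$, where $n$ and $p$ denote the number of transmit antennas and number of time slots, respectively, and $k$ denotes the number of complex data symbols sent in one STBC matrix. The $(i, j)$th entry in $\mathbf{X}_c$ represents the complex number transmitted from the $i$th transmit antenna in the $j$th time slot. The rate of an STBC is $k/p$. Let $N_r$ and $N_t = n$ denote the number of receive and transmit antennas, respectively. Let $\mathbf{H}_c \in \mathbb{C}^{N_r \times N_t}$ denote the channel gain matrix, where the $(i, j)$th entry in $\mathbf{H}_c$ is the complex channel gain from the $j$th transmit antenna to the $i$th receive antenna. We assume that the channel gains remain constant over one STBC matrix and vary (i.i.d) from one STBC matrix to the other. Assuming rich scattering, we model the entries of $\mathbf{H}_c$ as i.i.d $\mathcal{CN}(0, 1)$. The received space-time signal matrix, $\mathbf{Y}_c \in \mathbb{C}^{N_r \times p}$, can be written as

$$\mathbf{Y}_c = \mathbf{H}_c \mathbf{X}_c + \mathbf{N}_c, \qquad (1)$$

where $\mathbf{N}_c \in \mathbb{C}^{N_r \times p}$ is the noise matrix at the receiver and its entries are modeled as i.i.d $\mathcal{CN}(0, \sigma^2 = N_t E_s / \gamma)$, where $E_s$ is the average energy of the transmitted symbols, and $\gamma$ is the average received SNR per receive antenna [3], and the $(i, j)$th entry in $\mathbf{Y}_c$ is the received signal at the $i$th receive antenna in the $j$th time-slot. Consider linear dispersion STBCs, where $\mathbf{X}_c$ can be written in the form [3]

$$\mathbf{X}_c = \sum_{i=1}^{k} x_c^{(i)} \mathbf{A}_c^{(i)}, \qquad (2)$$

where $x_c^{(i)}$ is the $i$th complex data symbol, and $\mathbf{A}_c^{(i)} \in \mathbb{C}^{N_t \times p}$ is its corresponding weight matrix. The received signal model in (1) can be written in an equivalent V-BLAST form as

$$\mathbf{y}_c = \sum_{i=1}^{k} x_c^{(i)} \big(\hat{\mathbf{H}}_c \mathbf{a}_c^{(i)}\big) + \mathbf{n}_c = \widetilde{\mathbf{H}}_c \mathbf{x}_c + \mathbf{n}_c, \qquad (3)$$

where $\mathbf{y}_c \in \mathbb{C}^{N_r p \times 1} = vec(\mathbf{Y}_c)$, $\hat{\mathbf{H}}_c \in \mathbb{C}^{N_r p \times N_t p} = (\mathbf{I} \otimes \mathbf{H}_c)$, $\mathbf{a}_c^{(i)} \in \mathbb{C}^{N_t p \times 1} = vec(\mathbf{A}_c^{(i)})$, $\mathbf{n}_c \in \mathbb{C}^{N_r p \times 1} = vec(\mathbf{N}_c)$, $\mathbf{x}_c \in \mathbb{C}^{k \times 1}$ whose $i$th entry is the data symbol $x_c^{(i)}$, and $\widetilde{\mathbf{H}}_c \in \mathbb{C}^{N_r p \times k}$ whose $i$th column is $\hat{\mathbf{H}}_c \mathbf{a}_c^{(i)}$, $i = 1, 2, \cdots, k$. Each element of $\mathbf{x}_c$ is an $M$-PAM/$M$-QAM symbol. Let $\mathbf{y}_c$, $\widetilde{\mathbf{H}}_c$, $\mathbf{x}_c$, $\mathbf{n}_c$ be decomposed into real and imaginary parts as:

$$\mathbf{y}_c = \mathbf{y}_I + j\mathbf{y}_Q, \quad \mathbf{x}_c = \mathbf{x}_I + j\mathbf{x}_Q, \quad \mathbf{n}_c = \mathbf{n}_I + j\mathbf{n}_Q, \quad \widetilde{\mathbf{H}}_c = \mathbf{H}_I + j\mathbf{H}_Q. \qquad (4)$$

Further, we define $\mathbf{H}_r \in \mathbb{R}^{2N_r p \times 2k}$, $\mathbf{y}_r \in \mathbb{R}^{2N_r p \times 1}$, $\mathbf{x}_r \in \mathbb{R}^{2k \times 1}$, $\mathbf{n}_r \in \mathbb{R}^{2N_r p \times 1}$ as

$$\mathbf{H}_r = \begin{bmatrix} \mathbf{H}_I & -\mathbf{H}_Q \\ \mathbf{H}_Q & \mathbf{H}_I \end{bmatrix}, \quad \mathbf{y}_r = [\mathbf{y}_I^T \; \mathbf{y}_Q^T]^T, \qquad (5)$$

$$\mathbf{x}_r = [\mathbf{x}_I^T \; \mathbf{x}_Q^T]^T, \quad \mathbf{n}_r = [\mathbf{n}_I^T \; \mathbf{n}_Q^T]^T. \qquad (6)$$

Now, (3) can be written as

$$\mathbf{y}_r = \mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r. \qquad (7)$$

Henceforth, we work with the real-valued system in (7). For notational simplicity, we drop subscripts $r$ in (7) and write

$$\mathbf{y} = \mathbf{H}' \mathbf{x} + \mathbf{n}, \qquad (8)$$

where $\mathbf{H}' = \mathbf{H}_r \in \mathbb{R}^{2N_r p \times 2k}$, $\mathbf{y} = \mathbf{y}_r \in \mathbb{R}^{2N_r p \times 1}$, $\mathbf{x} = \mathbf{x}_r \in \mathbb{R}^{2k \times 1}$, and $\mathbf{n} = \mathbf{n}_r \in \mathbb{R}^{2N_r p \times 1}$. We assume that the channel coefficients are known at the receiver but not at the transmitter. Let $\mathbb{A}_i$ denote the $M$-PAM signal set from which $x_i$ ($i$th entry of $\mathbf{x}$) takes values, $i = 0, \cdots, 2k-1$. Now, define a $2k$-dimensional signal space $\mathbb{S}$ to be the Cartesian product of $\mathbb{A}_0$ to $\mathbb{A}_{2k-1}$. The ML solution is then given by

$$\mathbf{d}_{ML} = \arg\min_{\mathbf{d} \in \mathbb{S}} \; \mathbf{d}^T \mathbf{H}'^T \mathbf{H}' \mathbf{d} - 2\mathbf{y}^T \mathbf{H}' \mathbf{d}, \qquad (9)$$

whose complexity is exponential in $k$.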
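
To make the exponential cost of the ML rule in (9) concrete, the sketch below enumerates every candidate vector for a toy real-valued system and picks the one minimizing the ML metric. It is an illustration under stated assumptions, not part of the proposed algorithm: the 4 x 4 matrix `H`, the 2-PAM alphabet, and the helper name `ml_detect` are all hypothetical choices made here for the example.

```python
from itertools import product

def ml_detect(H, y, alphabet):
    """Exhaustive search for argmin_d  d^T H^T H d - 2 y^T H d.

    Visits len(alphabet) ** (number of columns of H) candidates, which is
    why this approach does not scale to hundreds of dimensions.
    """
    rows, cols = len(H), len(H[0])
    best, best_cost = None, float("inf")
    for d in product(alphabet, repeat=cols):
        Hd = [sum(H[r][c] * d[c] for c in range(cols)) for r in range(rows)]
        cost = sum(v * v for v in Hd) - 2.0 * sum(y[r] * Hd[r] for r in range(rows))
        if cost < best_cost:
            best, best_cost = d, cost
    return list(best)

# Toy example (assumed values): noiseless observation of a 2-PAM vector.
H = [[1.0, 0.5, 0.0, 0.2],
     [0.3, 1.0, 0.4, 0.0],
     [0.0, 0.2, 1.0, 0.5],
     [0.1, 0.0, 0.3, 1.0]]
x = [1, -1, -1, 1]
y = [sum(H[r][c] * x[c] for c in range(4)) for r in range(4)]
print(ml_detect(H, y, (-1, 1)))  # [1, -1, -1, 1]
```

With 4 real dimensions the search visits 16 candidates; with the 512 real dimensions of a 16 x 16 STBC and 4-QAM the same loop would need 2^512 candidates, which motivates the low-complexity detector developed next.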



A. Full-rate Non-orthogonal STBCs from CDA 

We focus on the detection of square (i.e., $n = p = N_t$), full-rate (i.e., $k = pn = N_t^2$), circulant (where the weight matrices $\mathbf{A}_c^{(i)}$'s are permutation type), non-orthogonal STBCs from CDA [1], whose construction for arbitrary number of transmit antennas $n$ is given by the matrix in Eqn. (9.a). In (9.a), $\omega_n = e^{j2\pi/n}$, $j = \sqrt{-1}$, and $d_{u,v}$, $0 \le u, v \le n-1$ are the $n^2$ data symbols from a QAM alphabet. When $\delta = t = 1$, the code in (9.a) is information lossless (ILL), and when $\delta = e^{\sqrt{5}\,j}$ and $t = e^{j}$, it is of full-diversity and information lossless (FD-ILL) [1]. High spectral efficiencies with large $n$ can be achieved using this code construction. However, since these STBCs are non-orthogonal, ML detection gets increasingly impractical for large $n$. Consequently, a key challenge in realizing the benefits of these large STBCs in practice is that of achieving near-ML performance for large $n$ at low decoding complexities. The BER performance results we report in Sec. IV show that the PDA-based decoding algorithm we propose in the following section essentially meets this challenge.



III. Proposed PDA-Based Decoding 

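In this section, we present the proposed PDA-based decoding algorithm for square QAM. The applicability of the algorithm to any rectangular QAM is straightforward. In the real-valued system model in (8), each entry of $\mathbf{x}$ belongs to a $\sqrt{M}$-PAM constellation, where $M$ is the size of the original square QAM constellation. Let $b_i^{(0)}, b_i^{(1)}, \cdots, b_i^{(q-1)}$ denote the $q = \log_2(\sqrt{M})$ constituent bits of the $i$th entry $x_i$ of $\mathbf{x}$. We can write the value of each entry of $\mathbf{x}$ as a linear combination of its constituent bits as

$$x_i = \sum_{j=0}^{q-1} 2^j b_i^{(j)}, \quad i = 0, 1, \cdots, 2k-1. \qquad (10)$$

Let $\mathbf{b} \in \{+1, -1\}^{2qk \times 1}$, defined as

$$\mathbf{b} \triangleq \big[b_0^{(0)} \cdots b_0^{(q-1)} \; b_1^{(0)} \cdots b_1^{(q-1)} \cdots b_{2k-1}^{(0)} \cdots b_{2k-1}^{(q-1)}\big]^T, \qquad (11)$$

denote the transmitted bit vector. Defining $\mathbf{c} \triangleq [2^0 \; 2^1 \cdots 2^{q-1}]$, we can write $\mathbf{x}$ as

$$\mathbf{x} = (\mathbf{I} \otimes \mathbf{c})\,\mathbf{b}, \qquad (12)$$

where $\mathbf{I}$ is the $2k \times 2k$ identity matrix. Using (12), we can rewrite (8) as

$$\mathbf{y} = \underbrace{\mathbf{H}'(\mathbf{I} \otimes \mathbf{c})}_{\triangleq\, \mathbf{H}}\, \mathbf{b} + \mathbf{n}, \qquad (13)$$

where $\mathbf{H} \in \mathbb{R}^{2N_r p \times 2qk}$ is the effective channel matrix. Our goal is to obtain $\hat{\mathbf{b}}$, an estimate of the $\mathbf{b}$ vector. For this, we iteratively update the statistics of each bit of $\mathbf{b}$, as described in the following subsection, for a certain number of iterations, and hard decisions are made on the final statistics to get $\hat{\mathbf{b}}$.

A. Iterative Procedure

The algorithm is iterative in nature, where $2qk$ statistic updates, one for each of the constituent bits, are performed in each iteration. We start the algorithm by initializing the a priori probabilities as $P(b_i^{(j)} = +1) = P(b_i^{(j)} = -1) = 0.5$, $\forall i = 0, \cdots, 2k-1$ and $j = 0, \cdots, q-1$. In an iteration, the statistics of the bits are updated sequentially, i.e., the ordered sequence of updates in an iteration is $\{b_0^{(0)}, \cdots, b_0^{(q-1)}, \cdots, b_{2k-1}^{(0)}, \cdots, b_{2k-1}^{(q-1)}\}$. The steps involved in each iteration of the algorithm are derived as follows.

The likelihood ratio of bit $b_i^{(j)}$ in an iteration, denoted by $\Lambda_i^j$, is given by

$$\Lambda_i^j \triangleq \frac{P(b_i^{(j)} = +1 \mid \mathbf{y})}{P(b_i^{(j)} = -1 \mid \mathbf{y})} = \underbrace{\frac{P(\mathbf{y} \mid b_i^{(j)} = +1)}{P(\mathbf{y} \mid b_i^{(j)} = -1)}}_{\triangleq\, \beta_i^j} \; \underbrace{\frac{P(b_i^{(j)} = +1)}{P(b_i^{(j)} = -1)}}_{\triangleq\, \alpha_i^j}. \qquad (14)$$

Denoting the $t$th column of $\mathbf{H}$ by $\mathbf{h}_t$, we can write (13) as

$$\mathbf{y} = \mathbf{h}_{qi+j}\, b_i^{(j)} + \underbrace{\sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\, b_l^{(m)} + \mathbf{n}}_{\triangleq\, \tilde{\mathbf{n}}}, \qquad (15)$$

where $\tilde{\mathbf{n}} \in \mathbb{R}^{2N_r p \times 1}$ is the interference plus noise vector. To calculate $\beta_i^j$, we approximate the distribution of $\tilde{\mathbf{n}}$ to be Gaussian, and hence $\mathbf{y}$ is Gaussian conditioned on $b_i^{(j)}$. Since there are $2qk - 1$ terms in the double summation in (15), this Gaussian approximation gets increasingly accurate for large $N_t$ (note that $k = N_t^2$). Since a Gaussian distribution is fully characterized by its mean and covariance, we evaluate the mean and covariance of $\mathbf{y}$ given $b_i^{(j)} = +1$ and $b_i^{(j)} = -1$. For notational simplicity, let us define $p_i^{j+} \triangleq P(b_i^{(j)} = +1)$ and $p_i^{j-} \triangleq P(b_i^{(j)} = -1)$. It is clear that $p_i^{j+} + p_i^{j-} = 1$. Let $\boldsymbol{\mu}_i^{j+} \triangleq E(\mathbf{y} \mid b_i^{(j)} = +1)$ and $\boldsymbol{\mu}_i^{j-} \triangleq E(\mathbf{y} \mid b_i^{(j)} = -1)$, where $E(\cdot)$ denotes the expectation operator. Now, from (15), we can write $\boldsymbol{\mu}_i^{j+}$ as

$$\boldsymbol{\mu}_i^{j+} = \mathbf{h}_{qi+j} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\,(2p_l^{m+} - 1). \qquad (16)$$

Similarly, we can write $\boldsymbol{\mu}_i^{j-}$ as

$$\boldsymbol{\mu}_i^{j-} = -\mathbf{h}_{qi+j} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\,(2p_l^{m+} - 1) = \boldsymbol{\mu}_i^{j+} - 2\mathbf{h}_{qi+j}. \qquad (17)$$

Next, the $2N_r p \times 2N_r p$ covariance matrix $\mathbf{C}_i^j$ of $\mathbf{y}$ given $b_i^{(j)}$ is given by

$$\mathbf{C}_i^j = E\Big\{\Big[\mathbf{n} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\big(b_l^{(m)} - 2p_l^{m+} + 1\big)\Big] \Big[\mathbf{n} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\big(b_l^{(m)} - 2p_l^{m+} + 1\big)\Big]^T\Big\}. \qquad (18)$$

Assuming independence among the constituent bits, we can simplify $\mathbf{C}_i^j$ in (18) as

$$\mathbf{C}_i^j = \sigma^2 \mathbf{I} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m} \mathbf{h}_{ql+m}^T \, 4 p_l^{m+}(1 - p_l^{m+}). \qquad (19)$$

Using the above mean and covariance expressions, we can write the distribution of $\mathbf{y}$ given $b_i^{(j)} = +1$ as

$$P(\mathbf{y} \mid b_i^{(j)} = +1) = \frac{e^{-(\mathbf{y} - \boldsymbol{\mu}_i^{j+})^T (\mathbf{C}_i^j)^{-1} (\mathbf{y} - \boldsymbol{\mu}_i^{j+})}}{(2\pi)^{N_r p}\, |\mathbf{C}_i^j|^{\frac{1}{2}}}. \qquad (20)$$

Similarly, $P(\mathbf{y} \mid b_i^{(j)} = -1)$ is given by

$$P(\mathbf{y} \mid b_i^{(j)} = -1) = \frac{e^{-(\mathbf{y} - \boldsymbol{\mu}_i^{j-})^T (\mathbf{C}_i^j)^{-1} (\mathbf{y} - \boldsymbol{\mu}_i^{j-})}}{(2\pi)^{N_r p}\, |\mathbf{C}_i^j|^{\frac{1}{2}}}. \qquad (21)$$

Using (20) and (21), $\beta_i^j$ can be written as

$$\beta_i^j = \frac{P(\mathbf{y} \mid b_i^{(j)} = +1)}{P(\mathbf{y} \mid b_i^{(j)} = -1)} = e^{-\left((\mathbf{y} - \boldsymbol{\mu}_i^{j+})^T (\mathbf{C}_i^j)^{-1} (\mathbf{y} - \boldsymbol{\mu}_i^{j+}) \,-\, (\mathbf{y} - \boldsymbol{\mu}_i^{j-})^T (\mathbf{C}_i^j)^{-1} (\mathbf{y} - \boldsymbol{\mu}_i^{j-})\right)}. \qquad (22)$$

Using $\alpha_i^j$ and $\beta_i^j$, $\Lambda_i^j$ is computed using (14). Now, using the value of $\Lambda_i^j$, the statistics of $b_i^{(j)}$ is updated as follows. From (14), and using $P(b_i^{(j)} = +1 \mid \mathbf{y}) + P(b_i^{(j)} = -1 \mid \mathbf{y}) = 1$, we have

$$P(b_i^{(j)} = +1 \mid \mathbf{y}) = \frac{\Lambda_i^j}{1 + \Lambda_i^j}, \qquad (23)$$

and

$$P(b_i^{(j)} = -1 \mid \mathbf{y}) = \frac{1}{1 + \Lambda_i^j}. \qquad (24)$$

As an approximation, dropping the conditioning on $\mathbf{y}$,

$$P(b_i^{(j)} = +1) \approx \frac{\Lambda_i^j}{1 + \Lambda_i^j}, \qquad (25)$$

and

$$P(b_i^{(j)} = -1) \approx \frac{1}{1 + \Lambda_i^j}. \qquad (26)$$

Using the above procedure, we update $P(b_i^{(j)} = +1)$ and $P(b_i^{(j)} = -1)$ for all $i = 0, \cdots, 2k-1$ and $j = 0, \cdots, q-1$ sequentially. This completes one iteration of the algorithm; i.e., each iteration involves the computation of $\alpha_i^j$ and equations (16), (17), (19), (22), (14), (25), and (26) for all $i, j$. The updated values of $P(b_i^{(j)} = +1)$ and $P(b_i^{(j)} = -1)$ in (25) and (26) for all $i, j$ are fed back to the next iteration². The algorithm terminates after a certain number of such iterations. At the end of the last iteration, hard decision is made on the final statistics to obtain the bit estimate $\hat{b}_i^{(j)}$ as $+1$ if $\Lambda_i^j > 1$, and $-1$ otherwise. In coded systems, $\Lambda_i^j$'s are fed as soft inputs to the decoder.

² The computation of the statistics of a current bit in an iteration makes use of the newly computed statistics of its previous bits (as per the ordered sequence of statistic updates) in the same iteration and the statistics of its next bits available from the previous iteration.

B. Complexity Reduction

The most computationally expensive operation in computing $\beta_i^j$ is the evaluation of the inverse of the covariance matrix, $\mathbf{C}_i^j$, of size $2N_r p \times 2N_r p$, which requires $O(N_r^3 p^3)$ complexity. This can be reduced as follows. Define matrix $\mathbf{D}$ as

$$\mathbf{D} \triangleq \sigma^2 \mathbf{I} + \sum_{l=0}^{2k-1} \sum_{m=0}^{q-1} \mathbf{h}_{ql+m} \mathbf{h}_{ql+m}^T \, 4 p_l^{m+}(1 - p_l^{m+}). \qquad (27)$$

At the start of the algorithm, with $p_i^{j+}$ and $p_i^{j-}$ initialized to 0.5 for all $i, j$, $\mathbf{D}$ becomes $\sigma^2 \mathbf{I} + \mathbf{H}\mathbf{H}^T$.

Computation of $\mathbf{D}^{-1}$: We note that when the statistics of $b_i^{(j)}$ is updated using (25) and (26), the $\mathbf{D}$ matrix in (27) also changes. A straightforward inversion of this updated $\mathbf{D}$ matrix would require $O(N_r^3 p^3)$ complexity. However, we can obtain the new $\mathbf{D}^{-1}$ from the previously available $\mathbf{D}^{-1}$ in $O(N_r^2 p^2)$ complexity as follows. Since the statistics of only $b_i^{(j)}$ is updated, the new $\mathbf{D}$ matrix is just a rank one update of the old $\mathbf{D}$ matrix. Therefore, using the matrix inversion lemma, the new $\mathbf{D}^{-1}$ can be obtained from the old $\mathbf{D}^{-1}$ as

$$\mathbf{D}^{-1} \leftarrow \mathbf{D}^{-1} - \frac{\mathbf{D}^{-1} \mathbf{h}_{qi+j} \mathbf{h}_{qi+j}^T \mathbf{D}^{-1}}{\mathbf{h}_{qi+j}^T \mathbf{D}^{-1} \mathbf{h}_{qi+j} + \frac{1}{\eta}}, \qquad (28)$$

where

$$\eta = 4\big(p_i^{j+}(1 - p_i^{j+}) - p_{i,old}^{j+}(1 - p_{i,old}^{j+})\big), \qquad (29)$$

where $p_i^{j+}$ and $p_{i,old}^{j+}$ are the new (i.e., after the update in (25) and (26)) and old (before the update) values, respectively. It can be seen that both the numerator and denominator in the 2nd term on the RHS of (28) can be computed in $O(N_r^2 p^2)$ complexity. Therefore, the computation of the new $\mathbf{D}^{-1}$ using the old $\mathbf{D}^{-1}$ can be done in $O(N_r^2 p^2)$ complexity.

Computation of $(\mathbf{C}_i^j)^{-1}$: Using (27) and (19), we can write $\mathbf{C}_i^j$ in terms of $\mathbf{D}$ as

$$\mathbf{C}_i^j = \mathbf{D} - 4 p_i^{j+}(1 - p_i^{j+})\, \mathbf{h}_{qi+j} \mathbf{h}_{qi+j}^T. \qquad (30)$$

We can compute $(\mathbf{C}_i^j)^{-1}$ from $\mathbf{D}^{-1}$ at a reduced complexity using the matrix inversion lemma, which states that

$$(\mathbf{P} + \mathbf{Q}\mathbf{R}\mathbf{S})^{-1} = \mathbf{P}^{-1} - \mathbf{P}^{-1}\mathbf{Q}\,(\mathbf{R}^{-1} + \mathbf{S}\mathbf{P}^{-1}\mathbf{Q})^{-1}\,\mathbf{S}\mathbf{P}^{-1}. \qquad (31)$$

Substituting $\mathbf{P}_{2N_r p \times 2N_r p} = \mathbf{D}$, $\mathbf{Q}_{2N_r p \times 1} = \mathbf{h}_{qi+j}$, $\mathbf{R}_{1 \times 1} = -4 p_i^{j+}(1 - p_i^{j+})$, and $\mathbf{S}_{1 \times 2N_r p} = \mathbf{h}_{qi+j}^T$ in (31), we get

$$(\mathbf{C}_i^j)^{-1} = \mathbf{D}^{-1} + \frac{\mathbf{D}^{-1} \mathbf{h}_{qi+j} \mathbf{h}_{qi+j}^T \mathbf{D}^{-1}}{\frac{1}{4 p_i^{j+}(1 - p_i^{j+})} - \mathbf{h}_{qi+j}^T \mathbf{D}^{-1} \mathbf{h}_{qi+j}}, \qquad (32)$$

which can be computed in $O(N_r^2 p^2)$ complexity.

Computation of $\boldsymbol{\mu}_i^{j+}$ and $\boldsymbol{\mu}_i^{j-}$: Computation of $\beta_i^j$ involves the computation of $\boldsymbol{\mu}_i^{j+}$ and $\boldsymbol{\mu}_i^{j-}$ also. From (17), it is clear that $\boldsymbol{\mu}_i^{j-}$ can be computed from $\boldsymbol{\mu}_i^{j+}$ with a computational overhead of only $O(N_r p)$. From (16), it can be seen that computing $\boldsymbol{\mu}_i^{j+}$ would require $O(q N_r p k)$ complexity. However, this complexity can be reduced as follows. Define vector $\mathbf{u}$ as

$$\mathbf{u} \triangleq \sum_{l=0}^{2k-1} \sum_{m=0}^{q-1} \mathbf{h}_{ql+m}\,(2p_l^{m+} - 1). \qquad (33)$$

Using (16) and (33), we can write

$$\boldsymbol{\mu}_i^{j+} = \mathbf{u} + 2(1 - p_i^{j+})\,\mathbf{h}_{qi+j}. \qquad (34)$$

$\mathbf{u}$ can be computed iteratively at $O(N_r p)$ complexity as follows. When the statistics of $b_i^{(j)}$ is updated, we can obtain the new $\mathbf{u}$ from the old $\mathbf{u}$ as

$$\mathbf{u} \leftarrow \mathbf{u} + 2(p_i^{j+} - p_{i,old}^{j+})\,\mathbf{h}_{qi+j}, \qquad (35)$$

whose complexity is $O(N_r p)$. Hence, the computation of $\boldsymbol{\mu}_i^{j+}$ and $\boldsymbol{\mu}_i^{j-}$ needs $O(N_r p)$ complexity. The listing of the proposed PDA algorithm is summarized in Table-I.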
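
The rank-one structure exploited in this subsection can be checked numerically. The sketch below applies the matrix inversion lemma to obtain the inverse of D + η h hᵀ from a known inverse of a symmetric D in quadratic time, without re-inverting. The function names and the 2 x 2 test values are assumptions made for this illustration; they are not taken from the paper.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rank_one_inverse_update(D_inv, h, eta):
    """Return (D + eta * h h^T)^{-1} given D^{-1}, for symmetric D.

    Uses the matrix inversion lemma:
        D_inv - (D_inv h)(D_inv h)^T / (1/eta + h^T D_inv h)
    which costs O(n^2) instead of the O(n^3) of a fresh inversion.
    """
    n = len(h)
    if eta == 0.0:                      # nothing changed, keep the old inverse
        return [row[:] for row in D_inv]
    g = matvec(D_inv, h)                # D^{-1} h (equals (h^T D^{-1})^T by symmetry)
    denom = 1.0 / eta + sum(a * b for a, b in zip(h, g))
    return [[D_inv[r][c] - g[r] * g[c] / denom for c in range(n)]
            for r in range(n)]

# Assumed toy values: D = diag(2, 4), h = [1, 1], eta = 2,
# so D + eta*h*h^T = [[4, 2], [2, 6]] with inverse [[0.3, -0.1], [-0.1, 0.2]].
D_inv = [[0.5, 0.0], [0.0, 0.25]]
print(rank_one_inverse_update(D_inv, [1.0, 1.0], 2.0))
```

The same routine covers both uses described above: a signed η for tracking D across bit updates, and a negative weight for removing the current bit's own contribution when forming the conditional covariance.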



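Table-I: Proposed PDA-based Algorithm Listing

Initialization
1. $p_i^{j+} = p_i^{j-} = 0.5$, $\Lambda_i^j = 1$, $\forall i = 0, 1, \cdots, 2k-1$, $j = 0, 1, \cdots, q-1$
2. $\mathbf{u} = \mathbf{0}$, $\mathbf{D}^{-1} = (\mathbf{H}\mathbf{H}^T + \sigma^2 \mathbf{I})^{-1}$
3. num_iter: number of iterations
4. $\kappa = 1$; $\kappa$ is the iteration number

Statistics update in the $\kappa$th iteration
5. for $i = 0$ to $2k-1$
6.   for $j = 0$ to $q-1$

Update of statistics of bit $b_i^{(j)}$
7.     $\boldsymbol{\mu}_i^{j+} = \mathbf{u} + 2(1 - p_i^{j+})\,\mathbf{h}_{qi+j}$
8.     $\boldsymbol{\mu}_i^{j-} = \boldsymbol{\mu}_i^{j+} - 2\mathbf{h}_{qi+j}$
9.     $(\mathbf{C}_i^j)^{-1} = \mathbf{D}^{-1} + \dfrac{\mathbf{D}^{-1}\mathbf{h}_{qi+j}\mathbf{h}_{qi+j}^T\mathbf{D}^{-1}}{\frac{1}{4p_i^{j+}(1-p_i^{j+})} - \mathbf{h}_{qi+j}^T\mathbf{D}^{-1}\mathbf{h}_{qi+j}}$
10.    $\beta_i^j = e^{-\left((\mathbf{y}-\boldsymbol{\mu}_i^{j+})^T(\mathbf{C}_i^j)^{-1}(\mathbf{y}-\boldsymbol{\mu}_i^{j+}) - (\mathbf{y}-\boldsymbol{\mu}_i^{j-})^T(\mathbf{C}_i^j)^{-1}(\mathbf{y}-\boldsymbol{\mu}_i^{j-})\right)}$
11.    $p_{i,old}^{j+} = p_i^{j+}$
12.    $\alpha_i^j = \dfrac{p_i^{j+}}{1 - p_i^{j+}}$
13.    $\Lambda_i^j = \beta_i^j\,\alpha_i^j$
14.    $p_i^{j+} = \dfrac{\Lambda_i^j}{1 + \Lambda_i^j}$

Update of $\mathbf{u}$ and $\mathbf{D}^{-1}$
15.    $\mathbf{u} \leftarrow \mathbf{u} + 2(p_i^{j+} - p_{i,old}^{j+})\,\mathbf{h}_{qi+j}$
16.    $\eta = 4\big(p_i^{j+}(1 - p_i^{j+}) - p_{i,old}^{j+}(1 - p_{i,old}^{j+})\big)$
17.    $\mathbf{D}^{-1} \leftarrow \mathbf{D}^{-1} - \dfrac{\mathbf{D}^{-1}\mathbf{h}_{qi+j}\mathbf{h}_{qi+j}^T\mathbf{D}^{-1}}{\mathbf{h}_{qi+j}^T\mathbf{D}^{-1}\mathbf{h}_{qi+j} + \frac{1}{\eta}}$
18. end; End of for loop starting at line 5
19. if ($\kappa$ = num_iter) goto line 21
20. $\kappa = \kappa + 1$, goto line 5
21. $\hat{b}_i^{(j)} = \mathrm{sgn}(\log(\Lambda_i^j))$, $\forall i = 0, 1, \cdots, 2k-1$, $j = 0, 1, \cdots, q-1$
22. $\hat{x}_i = \sum_{j=0}^{q-1} 2^j \hat{b}_i^{(j)}$, $\forall i = 0, 1, \cdots, 2k-1$
23. Terminate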
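
The listing can be exercised on a small real-valued system. The following is a minimal sketch of the iteration in plain Python, not the authors' implementation. Several choices are assumptions made here for numerical safety and brevity: the likelihood ratio is tracked in the log domain, it is clipped to a range given by the hypothetical parameter `clip`, the update of the inverse is skipped when the probability did not change, and the toy matrix and noise variance are illustrative values. The helper names (`invert`, `dot`, `pda_detect`, `bits_to_symbols`) are likewise hypothetical.

```python
import math

def invert(A):
    """Gauss-Jordan inverse with partial pivoting (used once, for the initial D^-1)."""
    n = len(A)
    M = [list(map(float, A[r])) + [1.0 if r == c else 0.0 for c in range(n)]
         for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pda_detect(H, y, sigma2, num_iter=10, clip=20.0):
    """Bit-wise PDA on y = H b + n with b in {+1,-1}; returns hard decisions."""
    m, nbits = len(H), len(H[0])
    cols = [[H[r][t] for r in range(m)] for t in range(nbits)]
    D = [[dot(H[r], H[c]) + (sigma2 if r == c else 0.0) for c in range(m)]
         for r in range(m)]                       # sigma^2 I + H H^T
    D_inv = invert(D)
    p = [0.5] * nbits                             # P(bit = +1)
    L = [0.0] * nbits                             # log-likelihood ratios
    u = [0.0] * m                                 # running mean vector
    for _ in range(num_iter):
        for t in range(nbits):
            h = cols[t]
            g = [dot(row, h) for row in D_inv]    # D^-1 h
            s = dot(h, g)
            a = 4.0 * p[t] * (1.0 - p[t])
            e_plus = [y[r] - u[r] - 2.0 * (1.0 - p[t]) * h[r] for r in range(m)]
            e_minus = [e_plus[r] + 2.0 * h[r] for r in range(m)]
            quads = []
            for v in (e_plus, e_minus):           # v^T C^-1 v via the lemma
                qv = dot(v, [dot(row, v) for row in D_inv])
                if a > 0.0:
                    qv += dot(g, v) ** 2 / (1.0 / a - s)
                quads.append(qv)
            log_beta = -(quads[0] - quads[1])
            # log(alpha) equals the previous L, since p = 1 / (1 + exp(-L))
            L[t] = max(-clip, min(clip, log_beta + L[t]))
            p_old, p[t] = p[t], 1.0 / (1.0 + math.exp(-L[t]))
            for r in range(m):
                u[r] += 2.0 * (p[t] - p_old) * h[r]
            eta = 4.0 * (p[t] * (1.0 - p[t]) - p_old * (1.0 - p_old))
            if eta != 0.0:
                denom = 1.0 / eta + s
                D_inv = [[D_inv[r][c] - g[r] * g[c] / denom for c in range(m)]
                         for r in range(m)]
    return [1 if v > 0 else -1 for v in L]

def bits_to_symbols(bits, q):
    """Map groups of q bits to PAM symbols: x_i = sum_j 2^j * b_i^(j)."""
    return [sum((2 ** j) * bits[q * i + j] for j in range(q))
            for i in range(len(bits) // q)]

# Assumed toy system: noiseless observation, one bit per real dimension (q = 1).
H = [[1.0, 0.5, 0.0, 0.2],
     [0.3, 1.0, 0.4, 0.0],
     [0.0, 0.2, 1.0, 0.5],
     [0.1, 0.0, 0.3, 1.0]]
b = [1, -1, -1, 1]
y = [dot(H[r], b) for r in range(4)]
print(pda_detect(H, y, sigma2=0.001))
print(bits_to_symbols([1, -1, -1, 1], 2))  # [-1, 1]
```

For this well-conditioned noiseless example the detector is expected to return the transmitted bits. The sketch makes no claim about performance at the dimensions studied in the paper, where the exact ordering, clipping, and stopping rules would need to follow the authors' implementation.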

C. Overall Complexity 

We need to compute $\mathbf{H}\mathbf{H}^T$ at the start of the algorithm. This requires $O(qkN_r^2p^2)$ complexity. So the computation of the initial $\mathbf{D}^{-1}$ in line 2 requires $O(qkN_r^2p^2) + O(N_r^3p^3)$. Based on the complexity reduction in Sec. III-B, the complexity in updating the statistics of one constituent bit (lines 7 to 17) is $O(N_r^2p^2)$. So, the complexity for the update of all the $2qk$ constituent bits in an iteration is $O(qkN_r^2p^2)$. Since the number of iterations is fixed, the overall complexity of the algorithm is $O(qkN_r^2p^2) + O(N_r^3p^3)$. For $N_t = N_r$, since there are $k$ symbols per STBC and $q$ bits per symbol, the overall complexity per bit is $O(p^2N_t^2)$.
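
To give a feel for the saving, the sketch below compares order-of-magnitude operation counts per iteration with and without the rank-one updates. The counts keep only the leading terms and ignore all constant factors, so they are rough estimates under that assumption rather than measured costs; the function names are hypothetical.

```python
def pda_ops_per_iteration(n_r, p, k, q):
    """Leading-order count: 2qk bit updates, each quadratic in the
    real observation dimension 2 * n_r * p (rank-one updates)."""
    dim = 2 * n_r * p
    return 2 * q * k * dim * dim

def naive_ops_per_iteration(n_r, p, k, q):
    """Same loop, but re-inverting the covariance matrix from scratch
    for every bit, which is cubic in the observation dimension."""
    dim = 2 * n_r * p
    return 2 * q * k * dim ** 3

# 16 x 16 STBC with 4-QAM: n_r = p = 16, k = 256, q = 1
n = 16
ratio = naive_ops_per_iteration(n, n, n * n, 1) // pda_ops_per_iteration(n, n, n * n, 1)
print(ratio)  # 512
```

The ratio equals the real observation dimension, 2·N_r·p, so the benefit of the lemma-based updates grows with the size of the STBC.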

IV. Results and Discussions 

In this section, we present the simulated uncoded/coded BER
of the PDA algorithm in decoding non-orthogonal STBCs
from CDA³. The number of iterations in the PDA algorithm is
set to m = 10 in all the simulations.

³ Our simulation results showed that the performance of FD-ILL ($\delta = e^{\sqrt{5}\,j}$, $t = e^{j}$) and ILL ($\delta = t = 1$) STBCs with PDA decoding were almost the same. Here, we present the performance of ILL STBCs.

Fig. 1. Comparison of uncoded BER of PDA and LAS algorithms in decoding 4 x 4, 8 x 8, 16 x 16 ILL STBCs. N_t = N_r, 4-QAM. Number of iterations m = 10 for PDA. MMSE initial vector for LAS. BER improves for increasing STBC sizes. With 4-QAM, PDA and LAS algorithms achieve almost the same performance.



PDA versus LAS performance with 4-QAM: In Fig. 1, we plot
the uncoded BER of the PDA algorithm as a function of average
received SNR per rx antenna, $\gamma$, in decoding 4 x 4, 8 x 8,
16 x 16 STBCs from CDA with N_t = N_r and 4-QAM. Perfect
channel state information at the receiver (CSIR) and i.i.d fading
are assumed. For the same settings, the performance of
the LAS algorithm in [13]-[15] with MMSE initial vector is
also plotted for comparison. From Fig. 1, it is seen that

• the BER performance of the PDA algorithm improves and
approaches SISO AWGN performance as N_t = N_r is
increased; e.g., performance within about 1 dB
of SISO AWGN performance is achieved at $10^{-3}$ uncoded
BER in decoding 16 x 16 STBC from CDA having
512 real dimensions, and this illustrates the ability of the
PDA algorithm to achieve excellent performance at low
complexities in large non-orthogonal STBC MIMO.

• with 4-QAM, PDA and LAS algorithms achieve almost
the same performance.

PDA versus LAS performance with 16-QAM: Figure 2 presents
an uncoded BER comparison between PDA and LAS algorithms
for 16 x 16 STBC from CDA with N_t = N_r = 16
and 16-QAM under perfect CSIR and i.i.d fading. It can be
seen that the PDA algorithm performs better at low SNRs than
the LAS algorithm. For example, with 8 x 8 and 16 x 16
STBCs, at low SNRs (e.g., < 25 dB for 16 x 16 STBC), the PDA
algorithm performs better by about 1 dB compared to the LAS
algorithm at $10^{-2}$ uncoded BER.

Turbo coded BER performance of PDA: Figure 3 shows the
rate-3/4 turbo coded BER of the PDA algorithm under perfect
CSIR and i.i.d fading for 12 x 12 ILL STBC with N_t =
N_r = 12 and 4-QAM, which corresponds to a spectral efficiency
of 18 bps/Hz. The theoretical minimum SNR required
to achieve 18 bps/Hz spectral efficiency on a N_t = N_r = 12
MIMO channel with perfect CSIR and i.i.d fading is 4.3 dB
(obtained through simulation of the ergodic capacity formula
[3]). From Fig. 3, it is seen that the PDA algorithm is able
to achieve vertical fall in coded BER within about 5 dB from
the theoretical minimum SNR, which is a good nearness to
capacity performance.

Fig. 2. Comparison of uncoded BER of PDA and LAS algorithms in decoding 4 x 4, 8 x 8, 16 x 16 ILL STBCs, plotted against average received SNR (dB). N_t = N_r, 16-QAM. Number of iterations m = 10 for PDA. MMSE initial vector for LAS. With 16-QAM, PDA performs better than LAS at low SNRs.

Iterative PDA Decoding/Channel Estimation: We relax the
perfect CSIR assumption by considering a training based iterative
PDA decoding/channel estimation scheme. Transmission
is carried out in frames, where one N_t x N_t pilot matrix
(for training purposes) followed by N_d data STBC matrices
are sent in each frame as shown in Fig. 4. One frame
length, T, (taken to be the channel coherence time) is T =
(N_d + 1)N_t channel uses. The proposed scheme works as follows
[16]: i) obtain an MMSE estimate of the channel matrix
during the pilot phase, ii) use the estimated channel matrix
to decode the data STBC matrices using the PDA algorithm, and
iii) iterate between channel estimation and PDA decoding
for a certain number of times. For 12 x 12 STBC from CDA,
in addition to perfect CSIR performance, Fig. 3 also shows
the performance with CSIR estimated using the proposed iterative
decoding/channel estimation scheme for N_d = 1 and
N_d = 8. 2 iterations between decoding and channel estimation
are used. With N_d = 8 (which corresponds to large
coherence times, i.e., slow fading) the BER and bps/Hz with
estimated CSIR get closer to those with perfect CSIR.
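
The effect of the pilot overhead on throughput can be illustrated with the frame structure described above. The sketch assumes that the pilot matrix carries no data, so that only N_d of the N_d + 1 matrices in a frame contribute to the spectral efficiency; the function names are hypothetical and the 18 bps/Hz figure is the perfect-CSIR operating point quoted earlier.

```python
def frame_length(n_t, n_d):
    """Channel uses per frame: one n_t x n_t pilot matrix plus n_d data matrices."""
    return (n_d + 1) * n_t

def effective_bps_per_hz(nominal, n_d):
    """Spectral efficiency after pilot overhead, assuming pilots carry no data."""
    return nominal * n_d / (n_d + 1)

print(frame_length(12, 1), effective_bps_per_hz(18, 1))  # 24 9.0
print(frame_length(12, 8), effective_bps_per_hz(18, 8))  # 108 16.0
```

Under this assumption, a longer coherence time (larger N_d) reduces the fraction of channel uses spent on training, which is one reason the bps/Hz with estimated CSIR moves closer to the perfect-CSIR value.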

Effect of Spatial MIMO Correlation: In Figs. 1 to 3, we assumed
i.i.d fading. But spatial correlation at transmit/receive
antennas and the structure of scattering and propagation environment
can affect the rank structure of the MIMO channel
resulting in degraded performance [17]. We relaxed the i.i.d.
fading assumption by considering the correlated MIMO channel
model in [18], which takes into account carrier frequency
($f_c$), spacing between antenna elements ($d_t$, $d_r$), distance between
tx and rx antennas ($R$), and scattering environment. In
Fig. 5, we plot the BER of the PDA algorithm in decoding
12 x 12 STBC from CDA with perfect CSIR in i) i.i.d. fading,
and ii) correlated MIMO fading model in [18]. It is seen
that, compared to i.i.d fading, there is a loss in diversity order
in spatial correlation for N_t = N_r = 12; further, use of
more rx antennas (N_r = 18, N_t = 12) alleviates this loss in
performance. We can decode perfect codes [19], [20] of large
dimensions also using the proposed PDA algorithm.

Fig. 3. Turbo coded BER of the PDA algorithm in decoding 12 x 12 ILL STBC with N_t = N_r, 4-QAM, rate-3/4 turbo code, 18 bps/Hz and m = 10 for i) perfect CSIR, and ii) estimated CSIR using 2 iterations between PDA decoding/channel estimation. With perfect CSIR, PDA performs within about 5 dB of capacity. With estimated CSIR, performance approaches that with perfect CSIR with increasing coherence times.

Fig. 4. Transmission scheme with one pilot matrix followed by N_d data STBC matrices in each frame.